introduction to rmarkdown

Anna Krystalli & Mike Croucher




ISBE 2016

‘Challenge for our generation: open, reproducible and reliable science’

3rd August 2016, University of Exeter

markdown .md

stripped down html


  • intended to be as easy-to-read and easy-to-write as possible.
  • intended for one purpose: to be used as a format for writing for the web.
  • syntax is very small, corresponding only to a very small subset of HTML tags.

focus on communicating & disseminating


  • formatting handled automatically
  • clean and legible across platforms and outputs

rmarkdown .Rmd

rmarkdown integrates:

a documentation language (.md)

with:

a programming language (R)


enables literate programming

single document to integrate data analysis with textual representations, linking data, code, and text
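As a rough sketch of what such a single document can look like, the R snippet below writes a minimal .Rmd file to a temporary folder. The title, file name and chunk label are made up for illustration; `airquality` is one of R's built-in datasets. Rendering is left commented out because it assumes the rmarkdown package and pandoc are installed.

```r
# Build a minimal R Markdown document as a character vector.
fence <- strrep("`", 3)  # three backticks open and close a code chunk

rmd_lines <- c(
  "---",
  "title: \"My first analysis\"",  # hypothetical title
  "output: html_document",
  "---",
  "",
  "## Air quality",
  "",
  "Daily measurements from the built-in `airquality` data.",
  "",
  paste0(fence, "{r summary}"),    # a code chunk: code and its output
  "summary(airquality$Ozone)",
  fence,
  "",
  # inline R code: the value is computed when the document is rendered
  "The mean temperature was `r round(mean(airquality$Temp), 1)` degrees F."
)

# Hypothetical file name, written to the session's temporary folder.
rmd_path <- file.path(tempdir(), "first-analysis.Rmd")
writeLines(rmd_lines, rmd_path)

# Rendering assumes rmarkdown and pandoc are available; uncomment to try it.
# rmarkdown::render(rmd_path)
```

Text, code and results live in the same file, so re-rendering after the data change updates every number in the report.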


outputs

it’s already everywhere!

Rmarkdown & reproducibility

Computational science has led to exciting new developments:

  • Technology is increasing data collection throughput; data are more complex and high-dimensional
  • Existing databases can be merged to become bigger databases
  • Computing power allows more sophisticated analyses, even on “small” data
  • For every field “X” there is a “Computational X”

Increasing computational complexity of analyses:

has exposed limitations in our ability to evaluate published findings.

  • Even basic analyses difficult to describe

  • Errors more easily introduced into long analysis pipelines

  • Knowledge transfer is inhibited

  • Results are difficult to replicate or reproduce

  • Complicated analyses cannot be trusted

calls for reproducibility


Reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible.

  • fully scripted analyses
  • make code and data available

reproducibility limitations

  • top down
  • downstream (post publication)
  • ultimately does not address the key question:

    can we trust these results?

evidence based science

evidence needs:

  • documenting
  • linking
  • communicating


rmarkdown can integrate tools, processes and outputs into evidence streams

at all stages of scientific process

simple tools:

low hanging fruit

  • begin at the start of the process
  • document & interlink evidence streams
  • explore and communicate!

empower your code and data

examples

report

code documentation

method collation

interactive documents

presentations

md basics

text

    normal text

normal text

    *italic text*

italic text

    **bold text**

bold text

    ***bold italic text***

bold italic text

    superscript^2^

superscript²

    ~~strikethrough~~

strikethrough

headers

unordered lists

ordered lists

quotes & code

    > this text will be quoted

this text will be quoted

    `this text will appear as code` inline

this text will appear as code inline

    a <- 10

    the value of parameter *a* is `r a`

the value of parameter a is 10

text formatting

images

    ![](https://www.rstudio.com/wp-content/uploads/2015/01/rmarkdown-cheatsheet-2-e1457627578814.png)
    
    ![](resources/cheat.png)
    

resize images

    <img src="resources/cheat.png" width="200px" />

basic tables

    Table Header  | Second Header
    ------------- | -------------
    Cell 1        | Cell 2
    Cell 3        | Cell 4

Table Header  | Second Header
------------- | -------------
Cell 1        | Cell 2
Cell 3        | Cell 4

online table to .md converter

.md resources

official markdown documentation

Rmarkdown documentation

Rstudio Rmarkdown cheatsheet

github.io websites: eg Andy South’s blog

Reproducible Research coursera MOOC

Producing html documents from .R scripts using knitr::spin
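To give a flavour of the last resource, here is a small sketch of `knitr::spin()`. It writes a hypothetical script (`air-report.R`, a made-up name) to a temporary folder and converts it to .Rmd. Lines starting with `#'` are treated as markdown text and everything else as R code. `knit = FALSE` is used so that only the conversion step runs.

```r
library(knitr)

# A hypothetical R script: roxygen-style comments hold the narrative.
script <- file.path(tempdir(), "air-report.R")
writeLines(c(
  "#' # Air quality",
  "#'",
  "#' Lines starting with #' become markdown text.",
  "",
  "summary(airquality$Ozone)"
), script)

# Convert the script to R Markdown without knitting it.
spin(script, knit = FALSE)

# The .Rmd is written next to the script, with the extension swapped.
rmd_path <- sub("\\.R$", ".Rmd", script)
rmd_lines <- readLines(rmd_path)
cat(rmd_lines, sep = "\n")
```

Calling `spin(script)` with the default `knit = TRUE` goes on to produce the final report from the same script.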

chunks

R code chunks can be used to render R output into documents or simply to display code for illustration

options

for more details see http://yihui.name/knitr/
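As an illustration of how one option changes what ends up in the document, the sketch below knits two tiny chunks held in a character vector. The chunk labels `shown` and `hidden` and their contents are made up for the example. The first chunk uses `echo = TRUE`, the second `echo = FALSE`.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks open and close a chunk

# Two hypothetical chunks: one echoes its code, one shows results only.
rmd <- c(
  paste0(fence, "{r shown, echo = TRUE}"),
  "x <- 1 + 1",
  "x",
  fence,
  "",
  paste0(fence, "{r hidden, echo = FALSE}"),
  "y <- 2 * 3",
  "y",
  fence
)

# Knit the text directly and collect the resulting markdown.
md <- knit(text = rmd, quiet = TRUE)
cat(md, sep = "\n")
```

In the knitted output the code of the first chunk appears alongside its result, while the second chunk contributes only its result.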

set default options

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
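A minimal sketch of how these defaults behave: they are usually set once in a setup chunk at the top of the document, and `opts_chunk$get()` can be used to check what is currently in force. Individual chunks can still override a default in their own header.

```r
library(knitr)

# Set document-wide defaults (typically in the first chunk of the .Rmd).
opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

# Query the defaults that now apply to every later chunk.
current <- opts_chunk$get(c("echo", "warning", "message"))
str(current)
```

`TRUE` and `FALSE` are written in full here because `T` and `F` are ordinary variables that can be reassigned.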

extras

knitr::kable() tables

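library(knitr)
data(airquality)
# kable() turns a data frame into a formatted table in the rendered document
kable(head(airquality), caption = "New York Air Quality Measurements")

New York Air Quality Measurements

Ozone | Solar.R | Wind | Temp | Month | Day
----- | ------- | ---- | ---- | ----- | ---
41    | 190     | 7.4  | 67   | 5     | 1
36    | 118     | 8.0  | 72   | 5     | 2
12    | 149     | 12.6 | 74   | 5     | 3
18    | 313     | 11.5 | 62   | 5     | 4
NA    | NA      | 14.3 | 56   | 5     | 5
28    | NA      | 14.9 | 66   | 5     | 6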

DT::datatable() tables

library(DT)
data(airquality)
# datatable() renders an interactive (sortable, searchable) HTML table
datatable(airquality, caption = "New York Air Quality Measurements")

plotly

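library(plotly)  # also attaches ggplot2, which provides the diamonds data

# draw a reproducible random sample of 1000 diamonds
set.seed(100)
d <- diamonds[sample(nrow(diamonds), 1000), ]

# scatter plot of price against carat with a smoother, one panel per cut;
# the text aesthetic is used by plotly for the hover tooltip
p <- ggplot(data = d, aes(x = carat, y = price)) +
  geom_point(aes(text = paste("Clarity:", clarity)), size = 1) +
  geom_smooth(aes(colour = cut, fill = cut)) + facet_wrap(~ cut)

# convert the static ggplot into an interactive plotly widget
ggplotly(p)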

shiny

outputs

rpubs

demo

Exercise

your mission

create your first .Rmd!

  • choose some data eg:
    • datasets package
    • data(package = .packages(all.available = TRUE))
  • show us some data in a table
  • plot some data
  • write a bit about what you did
  • publish it on rpubs. Add your link to our Google Doc
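If you are unsure where to start, the first three steps could look roughly like the sketch below, placed inside the code chunks of your .Rmd. The choice of `iris` and of the variables plotted is only an example; any dataset will do.

```r
library(knitr)

# 1. Choose some data: list what the datasets package offers, then pick one.
available <- data(package = "datasets")$results[, "Item"]
head(available)
data(iris)

# 2. Show some data in a table.
iris_table <- kable(head(iris), caption = "First six rows of iris")
iris_table

# 3. Plot some data, coloured by species.
plot(iris$Sepal.Length, iris$Petal.Length,
     col = iris$Species,
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")
```

The write-up goes in plain markdown text between the chunks.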

see my example: